IMPT of head and neck cancer: unsupervised machine learning treatment planning strategy for reducing radiation dermatitis

Radiation dermatitis is a major concern in intensity modulated proton therapy (IMPT) for head and neck cancer (HNC) despite its demonstrated superiority over contemporary photon radiotherapy. In this study, dose surface histogram data extracted from forty-four patients of HNC treated with IMPT was used to predict the normal tissue complication probability (NTCP) of skin. Grades of NTCP-skin were clustered using the K-means clustering unsupervised machine learning (ML) algorithm. A new skin-sparing IMPT (IMPT-SS) planning strategy was developed with three major changes and prospectively implemented in twenty HNC patients. Across skin surfaces exposed from 10 (S10) to 70 (S70) GyRBE, the skin's NTCP demonstrated the strongest associations with S50 and S40 GyRBE (0.95 and 0.94). The increase in the NTCP of skin per unit GyRBE is 0.568 for skin exposed to 50 GyRBE as compared to 0.418 for 40 GyRBE. Three distinct clusters were formed, with 41% of patients in G1, 32% in G2, and 27% in G3. The average (± SD) generalised equivalent uniform dose for G1, G2, and G3 clusters was 26.54 ± 6.75, 38.73 ± 1.80, and 45.67 ± 2.20 GyRBE. The corresponding NTCP (%) were 4.97 ± 5.12, 48.12 ± 12.72 and 87.28 ± 7.73 respectively. In comparison to IMPT, new IMPT-SS plans significantly (P < 0.01) reduced SX GyRBE, gEUD, and associated NTCP-skin while maintaining identical dose volume indices for target and other organs at risk. The mean NTCP-skin value for IMPT-SS was 34% lower than that of IMPT. The dose to skin in patients treated prospectively for HNC was reduced by including gEUD for an acceptable radiation dermatitis determined from the local patient population using an unsupervised MLA in the spot map optimization of a new IMPT planning technique. However, the clinical finding of acute skin toxicity must also be related to the observed reduction in skin dose.

Despite the benefits, a key concern in PBT for HNC is radiation dermatitis (RD), which is principally brought on by a decline in the peak to plateau ratio of the spreadout Bragg peak (SOBP). Patients treated with PBT for Ewing's sarcoma [8], breast [9], and brain and spine [10] have reported experiencing RD to varying degrees, ranging from grade-1 to grade-4. The quality of life (QOL) and treatment results may be significantly impacted by RD, which frequently occurs in the early stages of treatment [11][12][13].
The main factors that determine the degree of skin toxicity are the radiation dose received by the surface as measured by dose surface histograms (DSH), neoadjuvant and concurrent systemic treatment strategies, and patient and tumour characteristics [14][15][16][17][18]. The DSH/ dose volume histogram (DVH) of the skin and its correlation with clinically documented dermatitis in patients treated with photon IMRT/VMAT/Tomotherapy techniques [14][15][16][17] and passive scattering proton therapy (PSPT) [18] have been used to construct predictive models for RD. In PBT, DSH/DVH is influenced by the extent of the tumor's proximity to the skin, proton spot map optimization strategy, proton beam characteristics, and treatment delivery technique [18,19]. Arjomandy et al. 's experimental surface dose measurement study [20] verified that patients treated with the spot scanning technique (SST) received less exposure to the skin than those treated with PSPT. To the best of our knowledge, there is no study on the association between DSH/DVH and the severity of skin toxicity in HNC patients receiving IMPT. In addition, despite skin toxicity being acknowledged as a clinically important problem, no treatment planning strategy to prevent acute RD has been identified.
In this study, RD was predicted using DSH data collected from a group of HNC patients who had received IMPT. Unsupervised K-means clustering machine learning (ML) algorithm [21] was developed to group patients according to the likelihood that they will develop high, medium, or low grades of RD. In an attempt to reduce RD, a new IMPT optimization technique was developed that incorporates predictive model findings. Finally, the anticipated RD resulting from the new planning strategy was contrasted with the RD of initial patients treated using the old planning strategy.

Materials and methods
In this study, 64 HNC patients (Men:Female = 52:12; median age, 50 years; range, 21 to 80 years) with various histology who received IMPT were included. Each patient received simultaneous integrated boost (SIB) dosage to their primary tumour and bilateral neck nodes from the IMPT plan. In 19 (29.7%) patients treated for definitive settings, an SIB dose regimen of 70 (primary)/56 (nodes) Gy RBE were delivered over 35 fractions while in the remaining 45 (70.3%) post operative patients 60 (primary)/54 (nodes) Gy RBE were delivered in 30 fractions to the tumour bed. Of the 64 patients, the first 44 patients were used for the NTCP model creations and clustering of grades of NTCP using unsupervised K-means clustering ML algorithm. Whereas, the subsequent 20 patients were prospectively enrolled for a new skin sparing IMPT (IMPT-SS) treatment plan and predicted grades of RD were compared with the original IMPT planning technique. The percentage of patients treated with the two SIB dose regimen were similar in the group of patients enrolled for NTCP modelling and IMPT-SS.

Original IMPT plan
The RayStation (RaySearch Laboratory, Sweden) treatment planning system (TPS, v9.0), which was clinically commissioned for the ProteusPlus (IBA, Luvalleno) proton therapy system (PTS) outfitted with a dedicated PBS nozzle, was used to optimize the IMPT plans. It can modulate proton energy from 70.2 to 226.2 MeV (range 4.1-32 g/cm 2 ) with a corresponding spot size of 6.7 to 3 mm sigma in air and at the isocenter. The range of the proton beam can be reduced further to treat superficial tumour using an add-on Lexan (density = 1.20 g/ cm 2 ) range shifter (RS) having a physical dimension of 32 × 40 × 3 cm 3 .
The ProteusPlus proton beam characteristics data used in this study are detailed elsewhere [22,23]. Each treatment plan was created using four fields: two post obliques (120° ± 10°and 240° ± 10°) with the patient positioning system (PPS) angle inclined at ±30° and two anterior obliques (50° ± 5° and 310° ± 5°) with the PPS angle set at 0°. Laterally placed targets were treated by a pair of ipsilateral oblique (anterior and posterior) beams, whereas centrally placed primary CTVs were treated by all four beams. In every field, a rectangular range shifter (Lexan; density = 1.25 g/cm 2 , physical thickness = 3 cm) mounted on the PBS dedicated nozzle was employed, with its outside surface positioned 3 to 5 cm from the most extended body surface. The clinical target volumes (CTVs) for each plan were robustly optimised to account for set-up errors of ± 3 mm and range uncertainties of ± 3.5%. The Ray-Station TPS's Monte-Carlo (MC) algorithm was always employed for spot map optimization and volumetric dosage calculation. A statistical dose calculation uncertainty of 1% was utilised with a sampling history of 10,000 ions/ spot. A planning physicist, an independent physicist, and the clinical site-specific primary physician performed crucial evaluations of treatment plan on every patient. Cone-beam computed tomography (CBCT) volumetric image guidance was used for the daily verification of patient set-up and monitoring of any internal or external changes in patient anatomy.

Skin toxicity model
The treatment plan databases of the first 44 patients were retrospectively examined to estimate DSH provided by Where, skin r is the 3 mm cutaneous layer of skin created automatically using an in-house developed code from the body contours of the irradiated area. Body contour was constructed accurately as the external surface of the patient's CT-images excluding the thermoplastic and other immobilization devices. Irradiated area in this manuscript is the section of body contour receiving at least 10% of the prescription dose. In order to extract the DSH of the 3 mm irradiated cutaneous layers from the treatment plan, a second custom script was developed in the python scripting environment offered at RayStation TPS. An open-source program; Computational Environment for Radiation Therapy (CERR) was used to translate Digital Imaging and Communication in Medicine for Radiotherapy (DICOM RT) plans that also contained DSH into Matlab-readable format (MathWorks, Natick, MA, USA). The percentage of surface (S) receiving doses of X GyRBE (S x ), such as S10 GyRBE, S20 GyRBE, S30 GyRBE, S40 GyRBE, S50 GyRBE, S60 GyRBE, and S70 GyRBE respectively, was used to access the DSH of the skin in the various grades of RD. Re-defined generalised equivalent uniform dose (gEUD) was calculated for each patient with a known RD grade by The normal tissue complication probability (NTCP) of skin was estimated using DSH-based Lyman-Kutcher-Burman (LKB) model represented by where The specifics of the new formalism of DSH applied to re-defined generalised equivalent uniform dose (gEUD) for the LKB modelling of skin toxicity were described elsewhere [15]. A Matlab programme was created to determine the NTCP according to G plama et al. 's [15] (1) description of skin toxicity. It was decided, by consensus, to investigate the possibility of including anticipated skin toxicity in the treatment planning process. This was aimed at further utilisation of IMPT capabilities and the development of an optimization strategy for lowering skin toxicity, without lowering the recommended dose to the tumour.

K-means clustering
In order to avoid the inter-patient variability in assessing the grade of toxicity, an unsupervised ML algorithm, K-means clustering, was used to create the clusters of patients. In K-means, the unlabelled dataset is split up into several groups by clustering. We have used the elbow method to find the number of clusters within our patient data set. The goal of this approach is to cluster the data points in a way that minimises the sum of the squared distances between the data points and the centroid (H) represented by where y h i is the data within the cluster c h The 44 patients were clustered based on the NTCP and EUD, which divides the unlabelled dataset into various clustered groups. These clusters were used to find the range of DSH, EUD NTCP of skin toxicities of various grades.

New skin sparing IMPT (IMPT-SS)
Three significant adjustments were made in order to create a new IMPT treatment planning technique. First, whenever the CTV extend up to the body contour, a new CTV (CTV Eval) was created by subtracting 3 mm from the body contour. Then, while concurrently guaranteeing that CTV 95 ≥ 95% and CTV Eval 98 ≥ 98%, robust optimization criteria for set-up error and range uncertainty were applied to CTV Eval. In addition, the new optimization procedure used the mean EUD and DSH calculated for cluster groups from the initial 44 patients for a clinically tolerable skin toxicity, and the trade-off between CTV coverage and DSH was determined using the pareto function of the multi-criteria-optimization (MCO) method [24]. The elimination of the range shifter and setting the PPS angle on the two posterior oblique fields to zero degrees constituted the third adjustment to the new planning method. This method of planning will be referred to as skin sparing IMPT (IMPT-SS) from now on. This strategy was implemented prospectively in the planning of 20 HNC patients.
Dose received by each x% volume (Dx%) of the CTVs and OARs was used to compare the original (IMPT) and new skin sparing (IMPT-SS) treatment plans, and DSH and NTCP were used to assess the skin doses. For CTVs, D98%, D95%, and D1% were evaluated, whereas D1% and Dmean were evaluated for relevent OARs. Wilcoxon signed-rank tests were used to analyse the statistical differences between the two planning methodologies and the relationships between DSH, EUD, and NTCP parameters.

Analysis of DSH and clustering
The calculated NTCP and the skin surface area exposed to various radiation doses (Sx) showed a high association over the whole dose range, based on the correlation heat map. The S50 GyRBE and S40 GyRBE have the strongest correlations (0.95 and 0.94, respectively) with NTCP of skin. As shown in Fig. 1a and b, the NTCP increases nearly linearly with the percentage of skin surface area exposed to 50 Gy RBE and 40 Gy RBE, respectively. As anticipated, the increase in NTCP from 50 Gy RBE was greater than that from 40 Gy RBE for the same surface area of exposure. According to estimates, the rise in skin's NTCP per unit Gy RBE is 0.568 for skin exposed to 50 Gy RBE as opposed to 0.418 for skin exposed to 40 Gy RBE. The pair plot representation of the three clusters, group-1 (G1), group-2 (G2), and group-3 (G3), formed by the elbo technique used in the K-means clustering algorithm with gEUD, NTCP (%), S40 GyRBE, and S50 GyRBE, respectively, is shown in Fig. 2. These groups (G1, G2 and G3) are not the same as CTCAE grading system. The findings show a distinct cluster of patients with different NTCPs, with G1 showing a lower NTCP, G2 exhibiting a medium NTCP, and G3 exhibiting a greater NTCP of skin. There were 12 (27%) patients in G3, 14 (32%) patients in G2, and 18 (41%) patients in G1 clusters. Table 1 lists the mean ± SD of the DSH indices of three clusters. For G1, G2, and G3 groups, the percentage of surface area (mean ± SD) receiving 50 Gy RBE was 2.34 ± 3.17, 19.88 ± 5.77, and 36.56 ± 9.43%, respectively. Figure 3 displays the box plot of the EUD (Fig. 3a) and the best fit DSH-based LKB model (Fig. 3b) for the G1, G2, and G3 clusters calculated from the initial 44 patients. The average (± SD) gEUD was 26.54 ± 6.75 Gy RBE for G1, 38.73 ± 1.80 Gy RBE for G2 and 45.67 ± 2.20 Gy RBE for G3 clusters. The corresponding NTCP (%) for G1, G2 and G3 groups were 4.97 ± 5.12, 48.12 ± 12.72 and 87.28 ± 7.73 respectively.

Analysis of skin sparing planning technique
The new planning strategy IMPT-SS, when applied prospectively to the 20 patients, met clinical objectives and produced a dose distribution that was nearly identical to that of the original plans, with the exception of skin dose. The comparison of the dose distribution and dosage difference of two planning strategies is shown in Fig. 4. Although both planning procedures cover the same target area, a considerable dosage differential was seen close to the skin. The comparative dosimetric parameters of targets and OARs were displayed in Tables 2 and 3. Table 2 demonstrates that there is no statistically significant difference between the two planning approaches in the target coverage for any of the CTVs. The D98% for CTV eval was above 98% of the dosage in both groups, and the D 95% for CTVs was ≥ 95% of the prescribed dose. Similarly, both strategies were successful in achieving comparable OAR sparing, with the exception skin (P < 0.001), where the new strategy produced a much lower dose (Table 3).
IMPT-SS significantly (P < 0.01) reduced the SX GyRBE, gEUD, and associated NTCP, as indicated in Table 4. The mean NTCP values for IMPT-SS were 34% lower than from IMPT. In the IMPT and IMPT-SS plans, the mean ± SD of NTCP was 76.97 ± 26.83% and 42.73 ± 24.55%, respectively. The Wilcoxon signed-rank test's corresponding p values are shown in Table 4 together with the mean and standard deviation for the radiobiological and DSH indices. Both at higher and lower dosages, the IMPT-SS approach shows significantly less dose to the skin. In the IMPT technique, the median result for S20 GyRBE was 84.0% (range: 49-100%), but in the IMPT-SS approach, it was 68% (range: 22-98.4%). Similarly, the median value for IMPT and IMPT-SS plans was 38% (range: 0-87%) and 17% (range: 0-37%), respectively.

Discussion
Utilizing dosimetric data gathered from our own patient group who were treated for the HNC using the IMPT approach, we were able to establish a relationship between the dose and exposure to skin surface area for varying NTCP of skin. We chose to use the unsupervised k-means clustering machine learning algorithm to classify three group of skin toxicities: G1, G2 and G3.  Table 1 The overall mean and standard deviation (SD) of the percentage of skin surface receiving X GyRBE (Sx GyRBE) for three clusters (G1, G2 and G3) of patients The groups of skin toxicity referred in this study is not the same as the CTCAE grading system. This approach could potentially avoid the variability in the reporting of clinically reported skin toxicities caused by variations in patient and tumor characteristics, concurrent systemic and neoadjuvant treatment regimens, and physician  discrepancies. A significant association was found between the percentage of skin surface area exposed to various doses of radiation and observed skin toxicity. The S50 GyRBE was shown to be the most effective DSH indicator for predicting radiation dermatitis, with a mean surface area of 35.24%, correlating well with G3 and 19.7% with G2 patients. In a comparative examination of HNC patients treated with IMRT/VMAT, Kawamura et al. [17] found that V60 Gy was the most important predictive indicator, with V60Gy > 38 cm 3 being associated with 43.44% of grade-3 dermatitis. The lack of dose buildup characteristics in protons as compared to photons may be the cause of the lower dose limit for the manifestation of higher NTCP of skin.
Although previous studies using photon RT techniques have examined the DSH/DVH and its relationship to the level of skin toxicity and related NTCP, a comparable study using IMPT has not been done. Despite the complicated process involved, we chose DSH over DVH because, in our opinion, DSH represents a clinically Table 2 The overall mean and standard deviation (SD) dosimetry parameters (D98%, D95%, and D1%) of the CTVs in two treatment planning approaches (IMPT and IMPT-SS), along with the P values of the wilcoxon signed rank test P < 0.05 is considered statistically significant

Target
Mean ± SD of D98% from P value Mean ± SD of D95% from P value Mean ± SD of D1%  Table 3 The overall mean and standard deviation (SD) dosimetry parameters (D1%, Dmean) of the OARs in two treatment planning approaches (IMPT and IMPT-SS), along with the P values of the wilcoxon signed rank test  realistic natural tool for understanding radiation-induced skin toxicity. Interested readers may refer publication by Palma G et al. [15] for elaborate discussion on the use of DSH over DVH for skin NTCP modelling. Skin toxicity in IMPT is influenced by a number of treatment planning parameters. The size of the tumour in the beam direction, the planning approach for optimization, the spot energy of the irradiation fields, and the spot map optimization technique. It greatly depends on how dose gradients between the skin and the target are optimised. The posterior non-coplanar beam arrangement was used in the original IMPT plans to enable positioning of the range shifter (35 × 35 cm 2 ) from posterior oblique fields closure to patient's body surface without colliding with the patient or PPS. However, in the new IMPT-SS plans, the PPS rotation was avoided to simplify the treatment planning and delivery process. This necessitates removal of range shifter from the posterior oblique beams. The underdose to sallower section of tumour caused by the removal of range shifter from posterior oblique beams are compensated from the anterior oblique beams of the same side used with range shifter. Using the typical gEUD determined from our own patients for a clinically acceptable degree of skin toxicity, we investigated the potential of spot map optimization to lower skin dose. CTVeval was developed as part of the new treatment planning methodology so that clinical goals are not overly optimised during the robust optimization.
The dose delivered to the CTV and OARs is less affected by set up and range uncertainties while using this method. However, depending on the extent of the uncertainties taken into consideration, it is anticipated that robust treatment planning will lead to greater doses to nearby normal tissues in order to provide acceptable CTV coverage in all robustly optimised error situations [25,26]. A higher degree of robustness often leads to higher, and occasionally noticeably larger, OAR dosages, especially for setup robustness in HNC. A differential robustness technique was incorporated in our novel skin sparing planning strategy to prevent overdose clouds around the CTV and skin. The inclusion of the average EUD and prediction power of the NTCP of skin acquired from our own patient cohort was the third significant adjustment made in the new planning technique. Since multiple iteration takes a lot of time, MCO was used to create the ability to trade between potential solutions for skin and CTV coverage. The considerable reduction in skin toxicity grade when compared to initial patients treated with the original planning strategy was highly linked with the dosimetric gain in terms of reduction in gEUD and accompanying NTCP. The co-relation between the findings from this predictive model and clinically recorded skin toxicity data will be presented separately.

Conclusion
It has been established that the k-means unsupervised machine learning algorithm can be used to cluster predicted degrees of skin toxicity based on dose surface histograms produced from patients who underwent IMPT. A skin sparing IMPT treatment planning strategy has been proposed, incorporating three major changes during the robust optimization of CTV. One of the three modifications that significantly lowers skin toxicity is the inclusion of the gEUD derived from the studied patient group for a clinically acceptable grade of skin toxicity into the proton spot map optimization.